Inverse Lyndon words and Inverse Lyndon factorizations of words

Authors

  • Paola Bonizzoni
  • Clelia de Felice
  • Rocco Zaccagnino
  • Rosalba Zizza
Abstract

Motivated by applications to string processing, we introduce variants of the Lyndon factorization called inverse Lyndon factorizations. Their factors, named inverse Lyndon words, belong to a class that strictly contains the anti-Lyndon words, that is, the Lyndon words with respect to the inverse lexicographic order. We prove that any nonempty word w admits a canonical inverse Lyndon factorization, named ICFL(w), that maintains the main properties of the Lyndon factorization of w: it can be computed in linear time, it is uniquely determined, and it preserves a compatibility property for sorting suffixes. In particular, the compatibility property of ICFL(w) is a consequence of another result: any factor in ICFL(w) is a concatenation of consecutive factors of the Lyndon factorization of w with respect to the inverse lexicographic order. As for the applications, experimental results on biological datasets show that ICFL(w), combined with the Lyndon factorization, is intermediate between the Lyndon factorization and the LZ factorization with respect to the size of the factors. Moreover, ICFL(w) allows us to handle factors of the Lyndon factorization that are too long or too short.
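To make the objects named in the abstract concrete, here is a minimal Python sketch. It is not the authors' ICFL algorithm. It uses Duval's classical linear-time algorithm to compute the Lyndon factorization, with a hypothetical `inverse` flag that switches to the inverse lexicographic order. The helper `is_inverse_lyndon` assumes a definition that the abstract does not state: a word is inverse Lyndon when it is strictly greater than each of its nonempty proper suffixes. The function names and the quadratic suffix check are illustrative choices only.

```python
def lyndon_factorization(w, inverse=False):
    """Duval's algorithm: factor w into a nonincreasing sequence of Lyndon words.

    With inverse=True the letter order is reversed, which yields the Lyndon
    factorization with respect to the inverse lexicographic order.
    """
    def less(a, b):
        # Letter comparison under the chosen alphabet order.
        return a > b if inverse else a < b

    factors = []
    i, n = 0, len(w)
    while i < n:
        j, k = i + 1, i
        # Extend while the scanned prefix is still a power of a Lyndon word
        # followed by a proper prefix of it.
        while j < n and not less(w[j], w[k]):
            if less(w[k], w[j]):
                k = i          # strictly larger letter: restart the period
            else:
                k += 1         # equal letter: keep matching the period
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


def is_inverse_lyndon(w):
    """Assumed definition: w is strictly greater than every nonempty proper suffix."""
    return len(w) > 0 and all(w[s:] < w for s in range(1, len(w)))


if __name__ == "__main__":
    print(lyndon_factorization("banana"))                # standard order
    print(lyndon_factorization("banana", inverse=True))  # inverse order
    print(is_inverse_lyndon("bab"))
```

Under the assumed definition, the word bab is inverse Lyndon, yet its factorization in the inverse order has two factors (ba, b), so it is not anti-Lyndon. This is consistent with the strict containment stated in the abstract.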


Similar resources

Lyndon factorization of the Thue-Morse word and its relatives

Some attention has recently been given to the Lyndon factorization of infinite words [16], [10], [12]. These works are themselves related to the earlier works by Reutenauer [13] and Varricchio [17], concerned with unavoidable regularities and semigroup theory. The results we present here reinforce those in [10] and [12], and give an additional application of the general Lyndon factorization the...


Infinite Smooth Lyndon Words

Lyndon words are a class of words having lexicographical order properties. Smooth words are a class of words, related to the Kolakoski word, that can be easily compressed. Some infinite smooth words are also Lyndon words.


Primitive Words and Lyndon Words in Automatic and Linearly Recurrent Sequences

We investigate questions related to the presence of primitive words and Lyndon words in automatic and linearly recurrent sequences. We show that the Lyndon factorization of a k-automatic sequence is itself k-automatic. We also show that the function counting the number of primitive factors (resp., Lyndon factors) of length n in a k-automatic sequence is k-regular. Finally, we show that the numb...


Lyndon Words and Singular Factors of Sturmian Words

Two different factorizations of the Fibonacci infinite word were given independently in [10] and [6]. In a certain sense, these factorizations reveal a self-similarity property of the Fibonacci word. We first describe the intimate links between these two factorizations. We then propose a generalization to characteristic Sturmian words. Résumé. Deux factorisations du mot de Fibonacci ont été donné...


Universal Lyndon Words

A word w over an alphabet Σ is a Lyndon word if there exists an order defined on Σ for which w is lexicographically smaller than all of its conjugates (other than itself). We introduce and study universal Lyndon words, which are words over an n-letter alphabet that have length n! and such that all the conjugates are Lyndon words. We show that universal Lyndon words exist for every n and exhibit...



Journal title:
  • CoRR

Volume abs/1705.10277

Publication date 2017